Abstract
List decoding for arbitrarily varying channels (AVCs) under state constraints is investigated. It is
shown that rates within $\epsilon$ of the randomized coding capacity of AVCs with input-dependent state can
be achieved under maximal error with list decoding using lists of size $O(1/\epsilon)$. Under average error an
achievable rate region and converse bound are given for lists of size $L$. These bounds are based on two
different notions of symmetrizability and do not coincide in general. An example is given that shows
that for list size $L$ the capacity may be positive but strictly smaller than the randomized coding capacity.
This behavior is different from the situation without state constraints.

I. Introduction 



The arbitrarily varying channel (AVC) is a model for communication subject to time-varying
interference [?]. The time variation is captured by a channel state parameter, and coding schemes for these
channels are required to give a guarantee on the probability of error for all channel state sequences. The
AVC is thought of as an adversarial model in which the channel state is controlled by a jammer who
wishes to foil the communication between the encoder and decoder.



This short paper addresses the problem of list-decoding in an AVC when the state sequence is
constrained. The constraint is imposed through a per-letter cost $l(\cdot)$ on the state sequence: the
cost of the state sequence chosen by the jammer for $n$ channel uses must not exceed a total budget $n\Lambda$. The
randomized and deterministic coding capacities for this AVC variant were found by Csiszár and Narayan
[?], [?]. In particular, they showed that the deterministic coding capacity under average error $C_d(\Lambda)$
may be positive but strictly smaller than the randomized coding capacity $C_r(\Lambda)$. This is qualitatively
different from the situation for AVCs without constraints [?], where $C_d$ is either 0 or equal to $C_r$. They also
showed that symmetrizability as defined by Ericson [?] is sufficient for $C_d(\Lambda)$ to be positive [?].

In list-decoding, the decoder is allowed to output a list of $L$ messages and an error is declared only if
the list does not contain the transmitted message. For AVCs without constraints, list-decoding capacities
have been investigated under both maximal and average error. For maximal error, Ahlswede [?], [?] found
a quantity $C_{dep}$ such that a rate $C_{dep} - \epsilon$ is achievable with lists of size $O(1/\epsilon)$. We extend this result to the
situation with cost constraints and define a quantity $C_{dep}(\Lambda)$ such that a rate $C_{dep}(\Lambda) - \epsilon$ is achievable
under list-decoding with list size $O(1/\epsilon)$. This result on maximal error can be used to find the randomized
coding capacity of AVCs where the state can depend on the transmitted codeword, as well as in rateless
code constructions [?].

The average error list-$L$ capacity $\bar{C}_L$ without constraints was found independently by Blinovsky,
Narayan, and Pinsker [?], [?] and Hughes [?]. These authors defined the symmetrizability $L_{sym}$ of an
AVC and showed that $L_{sym}$ acts as a threshold on the list size: for $L \le L_{sym}$ the list-$L$ capacity is 0 and
for $L > L_{sym}$ the list-$L$ capacity is equal to the randomized coding capacity $C_r$. We show that under state
constraints the behavior is qualitatively different. The ability of the jammer to symmetrize the channel
depends on the input distribution $P$ and the cost constraint $\Lambda$. We define two kinds of symmetrizability for
list-decoding under state constraints. We show that for list size $L$ the coding strategy of Hughes [?] can
be used with input distributions $P$ such that $L$ is larger than the weak symmetrizability $L_{sym}(P, \Lambda)$. We
also prove a new converse for input distributions $P$ such that $L$ is smaller than the strong symmetrizability
$\tilde{L}_{sym}(P, \Lambda)$.

In general, $\tilde{L}_{sym}(P, \Lambda) < L_{sym}(P, \Lambda)$, which gives a gap between our achievable region and converse.
Closing this gap seems non-trivial; we conjecture that the converse can be tightened. However, our results
do imply a significant difference between the constrained and unconstrained setting. Without constraints,
the list-$L$ capacity $\bar{C}_L$ is either 0 or equal to the randomized coding capacity $C_r$. We show via a simple
example that under cost constraints (analogous to [?]) the list-$L$ capacity $\bar{C}_L(\Lambda)$ may be positive but
strictly smaller than the randomized coding capacity $C_r(\Lambda)$.

II. Definitions and main results 

We will use calligraphic type for sets and $[M] = \{1, 2, \ldots, M\}$ for integers $M$. For sets $\mathcal{X}$ and $\mathcal{Y}$, the
set $\mathcal{P}(\mathcal{X})$ is the set of probability distributions on $\mathcal{X}$, $\mathcal{P}_n(\mathcal{X})$ is the set of all distributions of composition
$n$, and $\mathcal{P}(\mathcal{Y}|\mathcal{X})$ is the set of all conditional distributions on $\mathcal{Y}$ conditioned on $\mathcal{X}$. For random variables
$(X, Y)$ with joint distribution $P_{XY}$ we will write $P_X$ and $P_Y$ for the marginal distributions and $P_{X|Y}$
for the conditional distribution of $X$ given $Y$. For a distribution $P \in \mathcal{P}(\mathcal{X}^m)$ we will denote by $P_i$ the
$i$-th marginal of $P$. Let $d_{\max}(P, Q)$ be the maximum deviation ($\ell_\infty$ distance) between two probability
distributions $P$ and $Q$.
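As a small illustration of this notation, the following Python sketch computes the type (empirical distribution) of a sequence and the maximum deviation between two distributions on the same alphabet. The function names `type_of` and `d_max` are ours and are only illustrative.

```python
from collections import Counter
from fractions import Fraction

def type_of(seq, alphabet):
    """Empirical distribution (type) of seq over the given alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {a: Fraction(counts[a], n) for a in alphabet}

def d_max(p, q):
    """Maximum deviation (l-infinity distance) between two distributions
    given as dicts over the same alphabet."""
    return max(abs(p[a] - q[a]) for a in p)

t = type_of([0, 1, 1, 0, 1], alphabet=[0, 1])
uniform = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(t[0], t[1])         # 2/5 3/5
print(d_max(t, uniform))  # 1/10
```

Exact fractions are used so that a type of denominator $n$ is represented without rounding.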

A. Channel model and codes 

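An AVC is a collection $\mathcal{W} = \{W(\cdot \mid \cdot, s) : s \in \mathcal{S}\}$ of channels from an input alphabet $\mathcal{X}$ to an
output alphabet $\mathcal{Y}$ parameterized by a state $s \in \mathcal{S}$, where all alphabets are finite. If $\mathbf{x} = (x_1, x_2, \ldots, x_n)$,
$\mathbf{y} = (y_1, y_2, \ldots, y_n)$ and $\mathbf{s} = (s_1, s_2, \ldots, s_n)$ are length $n$ vectors, the probability of $\mathbf{y}$ given $\mathbf{x}$ and $\mathbf{s}$ is
given by:

$$ W(\mathbf{y} \mid \mathbf{x}, \mathbf{s}) = \prod_{i=1}^{n} W(y_i \mid x_i, s_i) . \tag{1} $$

We are interested in the case where there is a bounded cost function $l : \mathcal{S} \to \mathbb{R}^+$ on the jammer. The
cost of an $n$-tuple is

$$ l(\mathbf{s}) = \sum_{k=1}^{n} l(s_k) . \tag{2} $$

The state obeys a state constraint $\Lambda$ if

$$ l(\mathbf{s}) \le n\Lambda \quad \text{a.s.} \tag{3} $$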
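The cost (2) and the constraint (3) can be checked directly for a given state sequence. The sketch below is a minimal illustration; the helper names and the quadratic per-letter cost used in the usage lines are assumptions made for the example.

```python
def sequence_cost(s, cost):
    """Total cost l(s) = sum_k l(s_k) of a state sequence, as in (2)."""
    return sum(cost(sk) for sk in s)

def satisfies_constraint(s, cost, lam):
    """True if l(s) <= n * lam, i.e. the state constraint (3) holds."""
    return sequence_cost(s, cost) <= len(s) * lam

def quadratic(s):
    # Illustrative per-letter cost l(s) = s^2.
    return s * s

s = [0, 2, 1, 0]
print(sequence_cost(s, quadratic))              # 5
print(satisfies_constraint(s, quadratic, 1.5))  # True, since 5 <= 4 * 1.5
print(satisfies_constraint(s, quadratic, 1.0))  # False, since 5 > 4 * 1.0
```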

An $(n, N, L)$ deterministic list code $\mathcal{C}$ for the AVC is a pair of maps $(\phi, \psi)$ where the encoding
function is $\phi : \{1, 2, \ldots, N\} \to \mathcal{X}^n$ and the decoding function is $\psi : \mathcal{Y}^n \to \{1, 2, \ldots, N\}^L$. The rate
of the code is $R = \frac{1}{n} \log(N/L)$. The codebook is the set of vectors $\{\mathbf{x}_i : 1 \le i \le N\}$, where $\mathbf{x}_i = \phi(i)$.
The decoding region for message $i$ is $D_i = \{\mathbf{y} : i \in \psi(\mathbf{y})\}$. We will often specify a code by the pairs
$\{(\mathbf{x}_i, D_i) : i = 1, 2, \ldots, N\}$, with the encoder and decoder implicitly defined.

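The maximal and average error probabilities $\varepsilon_L$ and $\bar{\varepsilon}_L$ are given by

$$ \varepsilon_L = \max_{\mathbf{s} \in \mathcal{S}^n(\Lambda)} \max_{i} \left( 1 - W(D_i \mid \mathbf{x}_i, \mathbf{s}) \right) \tag{4} $$

$$ \bar{\varepsilon}_L = \max_{\mathbf{s} \in \mathcal{S}^n(\Lambda)} \frac{1}{N} \sum_{i=1}^{N} \left( 1 - W(D_i \mid \mathbf{x}_i, \mathbf{s}) \right) , \tag{5} $$

where $\mathcal{S}^n(\Lambda)$ is the set of state sequences of length $n$ satisfying the constraint (3).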

A rate $R$ is called achievable under maximal (average) list-decoding with list size $L$ if for any $\epsilon > 0$ there
exists a sequence of $(n, N, L)$ list codes of rate at least $R - \epsilon$ whose maximal (average) error converges to 0.
The list-$L$ capacity is the supremum of achievable rates. We denote the list-$L$ capacities under maximal
and average error by $C_L(\Lambda)$ and $\bar{C}_L(\Lambda)$, respectively.
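To make the difference between the two error criteria concrete, the following sketch evaluates both from a table of success probabilities. The table `success` is hypothetical: row `k` holds $W(D_i \mid \mathbf{x}_i, \mathbf{s})$ for the $k$-th admissible state sequence and each message $i$.

```python
def maximal_error(success):
    """Maximal error as in (4): worst case over state sequences and messages."""
    return max(1 - p for row in success for p in row)

def average_error(success):
    """Average error as in (5): worst case over state sequences of the
    error averaged over messages."""
    return max(sum(1 - p for p in row) / len(row) for row in success)

# Hypothetical success probabilities for 2 state sequences and 2 messages.
success = [[1.0, 0.5],
           [0.75, 0.75]]
print(maximal_error(success))  # 0.5
print(average_error(success))  # 0.25
```

The maximal error is never smaller than the average error, which is why a rate achievable under maximal error is also achievable under average error.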



October 6, 2009 



DRAFT 






B. Symmetrizability and information quantities 

We call a channel $V(y \mid x_1, x_2, \ldots, x_m)$ from $\mathcal{X}^m$ to $\mathcal{Y}$ symmetric if for any permutation $\pi$ on $[m]$,

$$ V(y \mid x_1, x_2, \ldots, x_m) = V(y \mid x_{\pi(1)}, x_{\pi(2)}, \ldots, x_{\pi(m)}) \quad \forall (x_1, x_2, \ldots, x_m, y) . \tag{6} $$

A channel $U(s \mid x_1, x_2, \ldots, x_m)$ symmetrizes an AVC $\mathcal{W}$ if

$$ V(y \mid x, x_1, \ldots, x_m) = \sum_{s} W(y \mid x, s) U(s \mid x_1, \ldots, x_m) \tag{7} $$

is a symmetric channel. We denote by $\mathcal{U}_{sym}(m)$ the set of channels which symmetrize $\mathcal{W}$:

$$ \mathcal{U}_{sym}(m) = \left\{ U(s \mid x_1^m) : V(y \mid x, x_1^m) \text{ is symmetric} \right\} . \tag{8} $$

Note that $\mathcal{U}_{sym}(m)$ is a convex subset of channels $U(s \mid x_1, \ldots, x_m)$ defined by equality constraints from (6).

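For a distribution $P \in \mathcal{P}(\mathcal{X})$ we define the strong symmetrizing cost $\tilde{\lambda}_m(P)$ to be the smallest expected
cost of a channel $U(s \mid x_1^m)$ that symmetrizes the AVC $\mathcal{W}$ whose input $\tilde{P}(x_1^m)$ may be correlated but has
marginals equal to $P$:

$$ \tilde{\lambda}_m(P) = \min_{U \in \mathcal{U}_{sym}(m)} \; \max_{\tilde{P} \in \mathcal{P}(\mathcal{X}^m) : \tilde{P}_i = P \; \forall i} \; \sum_{x_1^m} \sum_{s} \tilde{P}(x_1^m) U(s \mid x_1^m) l(s) . \tag{9} $$

We call an AVC strongly $m$-symmetrizable under the constraint $\Lambda$ if $\tilde{\lambda}_m(P) \le \Lambda$. We define the strong
symmetrizability $\tilde{L}_{sym}(P, \Lambda)$ of the channel under input $P$ to be the largest integer $m$ such that
$\tilde{\lambda}_m(P) \le \Lambda$. That is,

$$ \tilde{L}_{sym}(P, \Lambda) = \max \left\{ m : \tilde{\lambda}_m(P) \le \Lambda \right\} . \tag{10} $$

We define the weak symmetrizing cost $\lambda_m(P)$ to be the smallest expected cost of a channel $U(s \mid x_1^m)$
that symmetrizes the AVC $\mathcal{W}$ with independent inputs:

$$ \lambda_m(P) = \min_{U \in \mathcal{U}_{sym}(m)} \sum_{x_1^m, s} P^m(x_1^m) U(s \mid x_1^m) l(s) , \tag{11} $$

where $P^m$ is the product distribution $P \times P \times \cdots \times P$. We call an AVC weakly $m$-symmetrizable
if $\lambda_m(P) \le \Lambda$. Similarly, the weak symmetrizability $L_{sym}(P, \Lambda)$ is the largest integer $m$ such that
$\lambda_m(P) \le \Lambda$. That is,

$$ L_{sym}(P, \Lambda) = \max \left\{ m : \lambda_m(P) \le \Lambda \right\} . \tag{12} $$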

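For a fixed input distribution $P(x)$ on $\mathcal{X}$ and channel $V(y \mid x)$, we will use the notation $I(P, V)$ to
denote the mutual information between the input and output of the channel:

$$ I(P, V) = \sum_{x, y} P(x) V(y \mid x) \log \frac{V(y \mid x)}{\sum_{x'} P(x') V(y \mid x')} . \tag{13} $$

We define the following two information sets:

$$ \mathcal{Q}(\Lambda) = \left\{ Q \in \mathcal{P}(\mathcal{S}) : \sum_{s} l(s) Q(s) \le \Lambda \right\} \tag{14} $$

$$ \mathcal{U}(P, \Lambda) = \left\{ U \in \mathcal{P}(\mathcal{S} \mid \mathcal{X}) : \sum_{s, x} U(s \mid x) P(x) l(s) \le \Lambda \right\} . \tag{15} $$

These in turn can be used to define two information quantities:

$$ C_{std}(\Lambda) = \max_{P \in \mathcal{P}(\mathcal{X})} \min_{Q \in \mathcal{Q}(\Lambda)} I\left( P, \sum_{s} W(y \mid x, s) Q(s) \right) \tag{16} $$

$$ C_{dep}(\Lambda) = \max_{P \in \mathcal{P}(\mathcal{X})} \min_{U \in \mathcal{U}(P, \Lambda)} I\left( P, \sum_{s} W(y \mid x, s) U(s \mid x) \right) . \tag{17} $$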
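The inner quantity in these definitions, the mutual information of the channel averaged over a state distribution $Q$, is simple to evaluate for a fixed $P$ and $Q$. The sketch below is illustrative only: the function names are ours, logarithms are taken base 2, and the toy AVC in the usage lines (binary input added to a binary state) is an assumption chosen so that the numbers can be checked by hand.

```python
from math import log2

def mutual_information(P, V):
    """I(P, V) in bits, where P[x] is the input distribution and V[x][y] = V(y|x)."""
    ny = len(V[0])
    Py = [sum(P[x] * V[x][y] for x in range(len(P))) for y in range(ny)]
    total = 0.0
    for x in range(len(P)):
        for y in range(ny):
            if P[x] > 0 and V[x][y] > 0:
                total += P[x] * V[x][y] * log2(V[x][y] / Py[y])
    return total

def averaged_channel(W, Q):
    """V(y|x) = sum_s W(y|x,s) Q(s), with W[x][s][y] = W(y|x,s)."""
    ns, ny = len(Q), len(W[0][0])
    return [[sum(Wx[s][y] * Q[s] for s in range(ns)) for y in range(ny)]
            for Wx in W]

# Toy AVC: Y = X + S with X in {0, 1} and S in {0, 1}.
a = 1
W = [[[1.0 if y == x + s else 0.0 for y in range(a + 2)]
      for s in range(a + 1)] for x in range(2)]
P = [0.5, 0.5]
print(mutual_information(P, averaged_channel(W, [1.0, 0.0])))  # 1.0
print(mutual_information(P, averaged_channel(W, [0.5, 0.5])))  # 0.5
```

Any $Q$ that satisfies the cost constraint gives an upper bound on the inner minimum for that $P$; finding the minimizing $Q$ requires a convex optimization that is not shown here.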



C. Main results 



Our first result extends the strategy of Ahlswede to the case of constrained AVCs under maximal error.
Theorem 1 (List decoding for maximal error): Let $\mathcal{W}$ be an arbitrarily varying channel with state cost
function $l(s)$ and cost constraint $\Lambda$. Then for any $\epsilon > 0$ the rate

$$ R = C_{dep}(\Lambda) - \epsilon \tag{18} $$

is achievable under maximal error using list decoding with list size

$$ L = O\left( \frac{1}{\epsilon} \right) . \tag{19} $$

Furthermore, the capacity $C_L(\Lambda)$ under maximal error using list decoding with list size $L$ is bounded:

$$ C_{dep}(\Lambda) - O(L^{-1}) \le C_L(\Lambda) \le C_{dep}(\Lambda) . \tag{20} $$

The proof is given in Appendix I. This result can be used together with a message authentication
strategy [?] to show that $C_{dep}(\Lambda)$ is the randomized coding capacity of AVCs with input-dependent state
[?].

For average error we can show an achievable rate region and converse bound which in general do not
coincide. Proofs of Theorems 2 and 3 are given in Appendix II. In both cases the results constrain the set
of input distributions in $\mathcal{P}(\mathcal{X})$. The intuition for the converse is that for any codebook with codewords of
type $P$, the jammer can choose a symmetrizing channel $U \in \mathcal{U}_{sym}(L)$ such that the expected cost under
any joint distribution with marginals equal to $P$ is within the cost constraint. Operationally, the jammer
chooses $L$ codewords from the codebook and uses them as inputs to $U$ to generate a state sequence $\mathbf{s}$
which satisfies the cost constraints.

Theorem 2 (Converse for average error): Let $\mathcal{W}$ be an arbitrarily varying channel with state cost
function $l(\cdot)$ and cost constraint $\Lambda$. Then we have the following upper bound on $\bar{C}_L(\Lambda)$:

$$ \bar{C}_L(\Lambda) \le \max_{P \in \mathcal{P}(\mathcal{X}) : \tilde{L}_{sym}(P, \Lambda) < L} \; \min_{Q \in \mathcal{Q}(\Lambda)} I\left( P, \sum_{s} W(y \mid x, s) Q(s) \right) . \tag{21} $$

For achievability we extend the coding strategy of Hughes [?] in a manner analogous to [?] to show
an achievable rate for input distributions $P$ such that $L > L_{sym}(P, \Lambda)$.

Theorem 3 (Achievability for average error): Let $\mathcal{W}$ be an arbitrarily varying channel with state cost
function $l(\cdot)$ and cost constraint $\Lambda$. Then we have the following lower bound on $\bar{C}_L(\Lambda)$:

$$ \bar{C}_L(\Lambda) \ge \max_{P \in \mathcal{P}(\mathcal{X}) : L_{sym}(P, \Lambda) < L} \; \min_{Q \in \mathcal{Q}(\Lambda)} I\left( P, \sum_{s} W(y \mid x, s) Q(s) \right) . \tag{22} $$

If $P^*$ is the maximizing input distribution for $C_{std}(\Lambda)$, then for list size $L > L_{sym}(P^*, \Lambda)$ we have

$$ \bar{C}_L(\Lambda) = C_{std}(\Lambda) . \tag{23} $$

III. Example and discussion 

We will now show via an example that the behavior of list-decoding under average error with state
constraints is qualitatively different from that without constraints. In particular, when the jammer must
satisfy a constraint $\Lambda < \infty$, positive rates may be achievable with list sizes that are smaller than the
unconstrained symmetrizability, and for a fixed list size the list-$L$ capacity may be positive but strictly
smaller than the randomized coding capacity. Let the input alphabet be $\mathcal{X} = \{0, 1\}$, the state alphabet be $\mathcal{S} = \{0, 1, \ldots, a\}$, and the
channel be defined by:

$$ Y = X + S . \tag{24} $$

We will consider a quadratic cost function $l(s) = s^2$.

Without constraints, Hughes [?] has found that the randomized capacity is

$$ C_r(\infty) = -\log \cos \frac{\pi}{a + 3} . \tag{25} $$

He also showed that for unconstrained AVCs the list-$L$ capacity obeys a strict threshold:

$$ \bar{C}_L(\infty) = \begin{cases} -\log \cos \dfrac{\pi}{a + 3} & L > a \\ 0 & L \le a . \end{cases} \tag{26} $$
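As a quick numerical check of the unconstrained formulas, the sketch below evaluates $-\log_2 \cos(\pi/(a+3))$ and the threshold behavior in the list size. For $a = 8$ it reproduces the value of 0.0597 bits per channel use quoted later in this section. The function names are ours, and the base-2 logarithm is an assumption consistent with rates in bits.

```python
from math import cos, log2, pi

def randomized_capacity_unconstrained(a):
    """-log2 cos(pi / (a + 3)), in bits per channel use."""
    return -log2(cos(pi / (a + 3)))

def list_capacity_unconstrained(a, L):
    """Threshold behavior: zero up to list size a, randomized capacity above."""
    return randomized_capacity_unconstrained(a) if L > a else 0.0

print(round(randomized_capacity_unconstrained(8), 4))  # 0.0597
print(list_capacity_unconstrained(8, 4))               # 0.0
```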

We are interested in the case when there is a cost constraint $\Lambda$ on the jammer. We must calculate the
minimum mutual information for different input distributions:

$$ I(P, \Lambda) = \min_{Q \in \mathcal{P}(\mathcal{S}) : \mathbb{E}_Q[l(s)] \le \Lambda} I(X \wedge Y) . \tag{27} $$






The randomized-coding capacity under the cost constraint $\Lambda$ is the maximum of $I(P, \Lambda)$ over $P$:

$$ C_r(\Lambda) = \max_{P \in \mathcal{P}(\mathcal{X})} I(P, \Lambda) . \tag{28} $$

These calculations can be easily done numerically.

To calculate the symmetrizability constraints, note that because the channel (24) is deterministic,
the symmetry constraints imply that any channel $U \in \mathcal{U}_{sym}$ must also be symmetric. Therefore
$U(s \mid x_1, x_2, \ldots, x_L)$ is only a function of the type of $(x_1, x_2, \ldots, x_L)$. Let $t$ denote this type. We now
view $\mathcal{U}_{sym}$ as containing channels $U(s \mid t)$. Note that for $y = 0$ we have

$$ \sum_{s} W(0 \mid 0, s) U(s \mid t) = U(0 \mid t) , \tag{29} $$

and by the symmetry constraint we have

$$ U(0 \mid t) = 0 \qquad t = 1, 2, \ldots, L . \tag{30} $$

Similarly, for $y = a + 1$ we have

$$ U(a \mid t) = 0 \qquad t = 0, 1, \ldots, L - 1 . \tag{31} $$

Finally, for $y = 1, 2, \ldots, a$ we have

$$ \sum_{s} W(y \mid 0, s) U(s \mid t) = U(y \mid t) \tag{32} $$

$$ = \sum_{s} W(y \mid 1, s) U(s \mid t - 1) \tag{33} $$

$$ = U(y - 1 \mid t - 1) \qquad y = 1, 2, \ldots, a, \quad t = 1, 2, \ldots, L . \tag{34} $$

The conditions (30), (31), and (34) characterize the linear symmetry constraints in $\mathcal{U}_{sym}$.
Thus for each input distribution $P$ we can find

$$ f(P) = \min_{U \in \mathcal{U}_{sym}} \sum_{s} \sum_{t} l(s) U(s \mid t) \binom{L}{t} P(0)^{L - t} P(1)^{t} . \tag{35} $$

This is a simple linear program. To calculate the strong $L$-symmetrizing cost, note that the set of all joint
distributions $\tilde{P}(x_1^L)$ with marginals equal to $P$ is also a convex set defined by linear equality constraints.
If we let

$$ r(\tilde{P}, t) = \sum_{x_1^L : T_{x_1^L} = t/L} \tilde{P}(x_1^L) \tag{36} $$

be the probability of a type-$t$ sequence under $\tilde{P}$, it is simple to numerically evaluate

$$ g(P) = \max_{\tilde{P}} \min_{U \in \mathcal{U}_{sym}} \sum_{s, t} l(s) U(s \mid t) r(\tilde{P}, t) . \tag{37} $$

Fig. 1. Randomized coding capacity $C_r(\Lambda)$ and bounds on list-$L$ capacity $\bar{C}_L(\Lambda)$ versus the state constraint $\Lambda$ for $L = 2$. (Plot title: randomized coding capacity and list-$L$ capacity bounds for $L = 2$; horizontal axis: jammer constraint $\Lambda$.)
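The symmetry constraints and the two cost objectives can be explored with a short brute-force sketch for the adder channel $Y = X + S$ with $l(s) = s^2$. The code checks that the deterministic choice $U(s \mid t) = 1\{s = t\}$ (the jammer adds up the $L$ chosen inputs) is a symmetrizing channel when $L \le a$, and evaluates its expected cost under a product input and under a fully correlated input with the same marginals. This $U$ is only one feasible point that we picked for illustration, so the two numbers are values of the objectives in (35) and (37) at that point, not the optimized costs. All function names are ours.

```python
from itertools import permutations, product
from math import comb

def output_dist(inputs, U, a):
    """V(y | x, x_1..x_L) = sum_s W(y|x,s) U(s|t) for Y = X + S, where
    x = inputs[0] and t is the number of ones among inputs[1:]."""
    x, t = inputs[0], sum(inputs[1:])
    V = [0.0] * (a + 2)
    for s in range(a + 1):
        V[x + s] += U[t][s]
    return V

def is_symmetrizing(U, L, a):
    """Brute-force check that V is invariant under permutations of all L+1 inputs."""
    for inputs in product([0, 1], repeat=L + 1):
        ref = output_dist(inputs, U, a)
        if any(output_dist(p, U, a) != ref for p in permutations(inputs)):
            return False
    return True

def cost(U, r, a):
    """sum_{s,t} s^2 U(s|t) r(t) for a vector r of type probabilities."""
    return sum(s * s * U[t][s] * r[t] for t in range(len(r)) for s in range(a + 1))

a, L, p1 = 8, 2, 0.5
U = [[1.0 if s == t else 0.0 for s in range(a + 1)] for t in range(L + 1)]
# Type probabilities under the product input and under x_1 = x_2 = ... = x_L.
r_product = [comb(L, t) * (1 - p1) ** (L - t) * p1 ** t for t in range(L + 1)]
r_correlated = [1 - p1] + [0.0] * (L - 1) + [p1]
print(is_symmetrizing(U, L, a))   # True
print(cost(U, r_product, a))      # 1.5
print(cost(U, r_correlated, a))   # 2.0
```

The correlated input is more expensive for the jammer than the product input in this instance, which is the effect that separates the strong and weak symmetrizing costs.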

We calculated the achievable rates and converse bounds for $a = 8$, and the results are shown for list
sizes $L = 2$ and $L = 4$ in Figures 1 and 2. For state constraint $\Lambda$, the randomized coding capacity
$C_r(\Lambda)$ in (28) is given by the dotted line. The achievable rate of Theorem 3 is shown by the solid line,
and the converse bound of Theorem 2 by the dashed line. These two curves are given by restricting the
optimization over $P$ in the right side of (28).

When $\Lambda = \infty$, the randomized coding capacity of this channel is given by (25) and is 0.0597
bits/channel use. Therefore, when $\Lambda = \infty$, the result in (26) shows that the list-$L$ capacity is 0
for $L \le 8$ and equal to 0.0597 for $L > 8$. That is, when the jammer is unconstrained, no positive rate is
achievable under average error using list decoding with list size smaller than 8. However, from Figures 1
and 2 we can see that when $\Lambda < \infty$ we can achieve positive rates for list sizes $L$ smaller than 8. Moreover,
for a range of $\Lambda$, the randomized coding capacity is achievable using lists of size 2 or 4. Figure 1 also
illustrates another fundamental difference between list-decoding with state constraints and list-decoding
without constraints: for a range around $\Lambda = 3$, the list-2 capacity $\bar{C}_2(\Lambda)$ is positive but strictly smaller
than the randomized coding capacity $C_r(\Lambda)$.

In general, we conjecture that the converse region of Theorem 2 is not tight and that a stronger converse
could be shown. The strong symmetrizing cost in (9) allows optimization over all joint distributions with
the same marginals. The converse proof uses a jamming strategy corresponding to taking a random set
of $L$ codewords from the codebook as inputs to a symmetrizing channel $U(s \mid x_1^L)$ to generate the state
sequence. The strong symmetrizing cost is a conservative bound on the cost of such a strategy. It may
be that techniques such as [?] could improve this bound; we leave this for future work. Our results
here establish that the behavior of list-decoding for constrained AVCs is fundamentally different from the
unconstrained case, much like the situation for list size 1.

Fig. 2. Randomized coding capacity $C_r(\Lambda)$ and bounds on list-$L$ capacity $\bar{C}_L(\Lambda)$ versus the state constraint $\Lambda$ for $L = 4$. (Plot title: randomized coding capacity and list-$L$ capacity bounds for $L = 4$; horizontal axis: jammer constraint $\Lambda$.)

Appendix I 
Maximal Error 

Using now-standard typicality arguments we can show the existence of list-decodable codes for maximal
error with exponential list size. The codebook is the entire set of typical sequences $T_P$ and the list
output by the decoder is the union of $\epsilon$-shells under the different state sequences. Let

$$ \mathcal{W}_{dep}(P, \Lambda) = \left\{ V(y \mid x) : V(y \mid x) = \sum_{s} W(y \mid x, s) U(s \mid x), \; U(s \mid x) \in \mathcal{U}(P, \Lambda) \right\} . \tag{38} $$

Proof of Theorem 1: The converse argument follows by choosing $\mathbf{s}$ according to the
minimizing distribution $U(s \mid x)$ in $\mathcal{U}(P, \Lambda)$. To show the achievable rate, without loss of generality,
suppose that the distribution $P$ maximizing $C_{dep}(\Lambda)$ is in $\mathcal{P}_n(\mathcal{X})$ and consider the set $T_P$ of all sequences
of length $n$ of type $P$ (if not we can always approach the optimal $P$ with large $n$). For any $V(y \mid x)$ we
define $V'(x \mid y)$ from $V(y \mid x) P(x)$ via the Bayes rule. The $(V', \epsilon)$-shell of typical $\mathbf{x}$ sequences around a
$\mathbf{y}$ is:

$$ T^{\epsilon}_{V'}(\mathbf{y}) = \left\{ \mathbf{x} \in T_P : d_{\max}(T_{\mathbf{x}\mathbf{y}}, V' T_{\mathbf{y}}) < \epsilon \right\} . \tag{39} $$

Then

$$ \frac{1}{n} \log |T^{\epsilon}_{V'}(\mathbf{y})| \le H_{V' T_{\mathbf{y}}}(X \mid Y) + O(\epsilon \log \epsilon^{-1}) , \tag{40} $$

where the subscript on $H$ indicates the joint distribution under which to take the conditional entropy.
Now, for a fixed $\mathbf{x} \in T_P$ and $\mathbf{s}$ with $l(\mathbf{s}) \le n\Lambda$, we define an empirical forward channel

$$ V_{\mathbf{x}\mathbf{s}}(y \mid x) = \sum_{s} W(y \mid x, s) \frac{T_{\mathbf{x}\mathbf{s}}(x, s)}{T_{\mathbf{x}}(x)} . \tag{41} $$

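Note that $V_{\mathbf{x}\mathbf{s}} \in \mathcal{W}_{dep}(P, \Lambda)$. For a fixed received sequence $\mathbf{y}$, define the set of channels consistent with
$\mathbf{y}$ as:

$$ \mathcal{V}_{\delta}(\mathbf{y}) = \left\{ V \in \mathcal{W}_{dep}(P, \Lambda) \cap \mathcal{P}_n(\mathcal{Y} \mid \mathcal{X}) : d_{\max}\left( \sum_{x} V(y \mid x) P(x), T_{\mathbf{y}} \right) < \delta \right\} . \tag{42} $$

Consider the set

$$ \mathcal{A}(\mathbf{y}) = \bigcup_{V \in \mathcal{V}_{\delta}(\mathbf{y})} T^{(|\mathcal{Y}| + 1)\delta}_{V'}(\mathbf{y}) . \tag{43} $$

Standard typicality arguments show that if $\mathbf{x}$ generated $\mathbf{y}$ via some $\mathbf{s}$ satisfying the cost constraint, then
with probability $1 - \exp(-n E(\delta))$, we have $\mathbf{x} \in \mathcal{A}(\mathbf{y})$. Furthermore:

$$ \frac{1}{n} \log |\mathcal{A}(\mathbf{y})| \le \max_{V \in \mathcal{V}_{\delta}(\mathbf{y})} H_{V(y \mid x) P(x)}(X \mid Y) + O(\delta \log \delta^{-1}) . \tag{44} $$

Note that we can view an encoding into all of $T_P$ and decoding into $\mathcal{A}(\mathbf{y})$ as a list-decodable code
with $2^{n H(P)}$ codewords and list size bounded by (44). To arrive at the desired code we can sample a set $\mathcal{B} = \{\mathbf{x}(i)\}$
of $2^{n(C_{dep}(\Lambda) - \epsilon)}$ codewords from this $T_P$ uniformly at random and say the decoder outputs $\mathcal{A}(\mathbf{y}) \cap \mathcal{B}$.
We must show this set has at most $L = O(1/\epsilon)$ codewords with high probability.

Let $R = C_{dep}(\Lambda) - \epsilon$. For each $\mathbf{y}$, the probability that any codeword of $\mathcal{B}$ is in $\mathcal{A}(\mathbf{y})$ is upper bounded
by $|\mathcal{A}(\mathbf{y})| / |T_P|$, so from (44) we see

$$ \mathbb{P}(\mathbf{x}(i) \in \mathcal{A}(\mathbf{y})) \le \exp\left( -n \left( C_{dep}(\Lambda) - O(\delta \log \delta^{-1}) \right) \right) . \tag{45} $$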

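Since codewords are selected independently, we can bound the chance that a fraction $L \cdot 2^{-nR}$ of the
$2^{nR}$ codewords end up in $\mathcal{A}(\mathbf{y})$ using Sanov's theorem [?, Theorem 12.4.1]:

$$ \mathbb{P}(|\mathcal{A}(\mathbf{y}) \cap \mathcal{B}| > L) \le \exp\left( -2^{nR} D\left( L 2^{-nR} \,\middle\|\, 2^{-n(C_{dep}(\Lambda) - O(\delta \log \delta^{-1}))} \right) + 2 \log(2^{nR} + 1) \right) . \tag{46} $$

Now we can bound the term $2^{nR} D(\cdot \| \cdot)$:

$$ -2^{nR} D(\cdot \| \cdot) = -L \log \frac{L 2^{-nR}}{2^{-n(C_{dep}(\Lambda) - O(\delta \log \delta^{-1}))}} - 2^{nR} \left( 1 - L 2^{-nR} \right) \log \frac{1 - L 2^{-nR}}{1 - 2^{-n(C_{dep}(\Lambda) - O(\delta \log \delta^{-1}))}} \tag{47} $$

$$ \le -nL \left( \epsilon - O(\delta \log \delta^{-1}) \right) - L \log L + 2L . \tag{48} $$

We can pick $\delta$ such that $O(\delta \log \delta^{-1}) < \epsilon/2$ by choosing $n$ sufficiently large. Then substituting (48) in
(46), upper bounding $R \le \log |\mathcal{Y}|$, and taking a union bound over all $\mathbf{y}$ we have:

$$ \mathbb{P}(\exists \mathbf{y} : |\mathcal{A}(\mathbf{y}) \cap \mathcal{B}| > L) \le \exp\left( -n \left( L\epsilon/2 - 2 \log |\mathcal{Y}| \right) - L \log L + 2L \right) . \tag{49} $$

For sufficiently large $n$, choosing $L > \lceil 4 \log |\mathcal{Y}| / \epsilon \rceil$ makes the exponent negative, showing that with high
probability the random selection will produce an $(n, 2^{nR}, L)$ list-decodable code under maximal error
whose error is bounded by $\exp(-n E(\delta))$. ■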

Appendix II 
Average Error 

A. Facts about symmetrizability 

The following lemma shows that if $I(P, \Lambda)$ is positive, then $L_{sym}(P, \Lambda)$ is finite. In particular, since
$I(P^*, \Lambda)$ is finite, the lemma implies that if $C_{std}(\Lambda) > 0$, then $L_{sym}(P^*, \Lambda) < \infty$. The proof follows
straightforwardly from the results of [?].

Lemma 1 (Finite symmetrizability): Let $\mathcal{W}$ be an arbitrarily varying channel with state cost function
$l(\cdot)$. If $C_{std}(\Lambda) = 0$ then $L_{sym}(P, \Lambda) = \infty$ for all $P$. If $C_{std}(\Lambda) > 0$ then

$$ L_{sym}(P, \Lambda) \le \frac{\log(\min(|\mathcal{Y}|, |\mathcal{S}|))}{I(P, \Lambda)} \tag{50} $$

for all $P$ such that $I(P, \Lambda) > 0$.



B. Achievability under average error 

Given a $P$ that is not weakly $L$-symmetrizable, we can use the coding scheme of Hughes [?] modified
in the natural way suggested by Csiszár and Narayan [?] for list size 1. The codebook consists of $N$
constant-composition codewords drawn uniformly from the codewords of type $P$. In order to describe
the decoding rule we will use, we define the set

$$ \mathcal{G}_{\eta}(\Lambda) = \left\{ P_{XSY} \in \mathcal{P}(\mathcal{X} \times \mathcal{S} \times \mathcal{Y}) : D(P_{XSY} \,\|\, P_X \times P_S \times W) \le \eta, \; \mathbb{E}[l(s)] \le \Lambda \right\} , \tag{51} $$

where

$$ (P_X \times P_S \times W)(x, s, y) = P_X(x) P_S(s) W(y \mid x, s) . \tag{52} $$

The set $\mathcal{G}_{\eta}(\Lambda)$ contains joint distributions which are close to those generated from the AVC $\mathcal{W}$ via
independent inputs with distribution $P_X$ and $P_S$.

Definition 1 (Decoding rule): Let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$ be a given codebook and suppose $\mathbf{y}$ was received.
Let $\psi(\mathbf{y})$ denote the list decoded from $\mathbf{y}$. Then put $i \in \psi(\mathbf{y})$ if and only if there exists an $\mathbf{s} \in \mathcal{S}^n(\Lambda)$
such that

1) $T_{\mathbf{x}_i \mathbf{s} \mathbf{y}} \in \mathcal{G}_{\eta}(\Lambda)$, and

2) for every set of $L$ other distinct codewords $\{\mathbf{x}_j : j \in J\}$, $J \subset [N] \setminus \{i\}$, $|J| = L$, such that there
exists a set $\{\mathbf{s}_j : \mathbf{s}_j \in \mathcal{S}^n(\Lambda), j \in J\}$ with $T_{\mathbf{x}_j \mathbf{s}_j \mathbf{y}} \in \mathcal{G}_{\eta}(\Lambda)$ for all $j \in J$ we have

$$ I(YX \wedge X^L \mid S) \le \eta , \tag{53} $$

where $P_{Y X X^L S}$ is the joint type of $(\mathbf{y}, \mathbf{x}_i, \{\mathbf{x}_j : j \in J\}, \mathbf{s})$.

An interpretation of this rule is that the decoder outputs a list of codewords $\{\mathbf{x}_i\}$ each having a "good
explanation" $\{\mathbf{s}_i\}$. A "good explanation" is a state sequence that plausibly could have generated the
observed output $\mathbf{y}$ (condition 1) and makes all other $L$-tuples of codewords seem independent of the
codeword and output (condition 2). The only thing to prove is that this decoding rule is unambiguous.
The key is to show that no tuple of random variables $(Y, X^{L+1}, S^{L+1})$ can satisfy the conditions of the
decoding rule. This in turn shows that for sufficiently large $n$, no set of $L + 1$ codewords can satisfy
the conditions of the decoding rule. Therefore, for sufficiently large blocklengths, the decoding rule will
only output $M$ or fewer codewords.

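Lemma 2: Let $\beta > 0$, $\mathcal{W}$ be an AVC with state cost function $l(\cdot)$ and constraint $\Lambda$, $P \in \mathcal{P}(\mathcal{X})$ with
$I(P, \Lambda) > 0$ and $\min_x P(x) \ge \beta$, and $M = L_{sym}(P, \Lambda) + 1$. Write $x^{M+1}_{-i}$ for the tuple $x^{M+1}$ with its $i$-th entry removed. For any $\alpha > 0$ and every collection of
distributions $\{U_i \in \mathcal{P}(\mathcal{X}^M \times \mathcal{S}) : i = 1, 2, \ldots, M + 1\}$ such that

$$ \sum_{x^{M+1}, s} P(x_i) U_i(x^{M+1}_{-i}, s) l(s) \le \lambda_M(P) - \alpha \tag{54} $$

for all $i = 1, 2, \ldots, M + 1$, there exists a $\zeta > 0$ such that

$$ \max_{j \ne i} \sum_{y, x^{M+1}} \left| \sum_{s} W(y \mid x_i, s) U_i(x^{M+1}_{-i}, s) P(x_i) - \sum_{s} W(y \mid x_j, s) U_j(x^{M+1}_{-j}, s) P(x_j) \right| \ge \zeta . \tag{55} $$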

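Proof: Note that the outer sum in (55) is over all $x^{M+1}$. Define the function $V_k$ on $\mathcal{X}^{M+1} \times \mathcal{S}$
by:

$$ V_k(x^{M+1}, s) = U_k(x^{M+1}_{-k}, s) . \tag{56} $$

Let $\Pi_{M+1}$ be the set of all permutations of $[M + 1]$ and for $\pi \in \Pi_{M+1}$ let $\pi_i$ be the image of $i$ under
$\pi$. Then

$$ \max_{j \ne i} \sum_{y, x^{M+1}} \left| \sum_{s} W(y \mid x_i, s) V_i(x^{M+1}, s) P(x_i) - \sum_{s} W(y \mid x_j, s) V_j(x^{M+1}, s) P(x_j) \right| $$

$$ = \max_{j \ne i} \sum_{y, x^{M+1}} \left| \sum_{s} W(y \mid x_i, s) V_{\pi_i}(\pi(x^{M+1}), s) P(x_i) - \sum_{s} W(y \mid x_j, s) V_{\pi_j}(\pi(x^{M+1}), s) P(x_j) \right| . \tag{57} $$

We can lower bound this by averaging over all $\pi \in \Pi_{M+1}$:

$$ \max_{j \ne i} \frac{1}{(M+1)!} \sum_{\pi \in \Pi_{M+1}} \sum_{y, x^{M+1}} \left| \sum_{s} W(y \mid x_i, s) V_{\pi_i}(\pi(x^{M+1}), s) P(x_i) - \sum_{s} W(y \mid x_j, s) V_{\pi_j}(\pi(x^{M+1}), s) P(x_j) \right| . \tag{58} $$

Define the average

$$ \bar{V}(x^{M+1}_{-i}, s) = \frac{1}{(M+1)!} \sum_{\pi \in \Pi_{M+1}} V_{\pi_i}(\pi(x^{M+1}), s) $$

$$ = \frac{1}{(M+1)!} \sum_{l=1}^{M+1} \sum_{\pi \in \Pi_{M+1} : \pi_i = l} V_l(\pi(x^{M+1}), s) $$

$$ = \frac{1}{(M+1)!} \sum_{l=1}^{M+1} \sum_{\sigma \in \Pi_M} U_l(\sigma(x^{M+1}_{-i}), s) . $$

Note that $\bar{V}$ is a symmetric function for all $s$.

Now we use the convexity of $|\cdot|$ to pull the averaging inside the absolute value to get a further lower
bound on (58) by substituting in $\bar{V}$:

$$ F(\bar{V}, P) = \max_{j \ne i} \sum_{y, x^{M+1}} \left| \sum_{s} W(y \mid x_i, s) \bar{V}(x^{M+1}_{-i}, s) P(x_i) - \sum_{s} W(y \mid x_j, s) \bar{V}(x^{M+1}_{-j}, s) P(x_j) \right| . \tag{59} $$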
The function $F(\bar{V}, P)$ is a continuous function on the compact set of symmetric distributions $\{\bar{V}\}$ and the
set of distributions $P$ with $\min_x P(x) \ge \beta$, so it has a minimum $\zeta = F(V^*, P^*)$ for some $(V^*, P^*)$. We
will prove that $\zeta > 0$ by contradiction.
Suppose $F(V^*, P^*) = 0$. Then

$$ \sum_{s} W(y \mid x_i, s) V^*(x^{M+1}_{-i}, s) P^*(x_i) = \sum_{s} W(y \mid x_j, s) V^*(x^{M+1}_{-j}, s) P^*(x_j) . $$

So

$$ \sum_{y} \sum_{s} W(y \mid x_i, s) V^*(x^{M+1}_{-i}, s) P^*(x_i) = \sum_{y} \sum_{s} W(y \mid x_j, s) V^*(x^{M+1}_{-j}, s) P^*(x_j) $$

$$ V^*(x^{M+1}_{-i}) P^*(x_i) = V^*(x^{M+1}_{-j}) P^*(x_j) , $$

which implies (see [?, Lemma A3]) that for all $j$:

$$ V^*(x^{M+1}_{-j}) P^*(x_j) = (P^*)^{M+1}(x^{M+1}) . $$

Therefore

$$ \sum_{s} W(y \mid x_1, s) V^*(s \mid x_2^{M+1}) \tag{60} $$

is symmetric in $(x_1, x_2, \ldots, x_{M+1})$. Therefore $V^*(s \mid x_2^{M+1}) \in \mathcal{U}_{sym}(M)$. From the definition of
$\lambda_M(P)$ in (11) we see that

$$ \sum_{x^{M+1}, s} V^*(x^{M+1}_{-i}, s) P(x_i) l(s) \ge \lambda_M(P) . \tag{61} $$

But from (54) and the definition of $\bar{V}$ we see that the $\{U_i\}$ must be chosen such that

$$ \sum_{x^{M+1}, s} V^*(x^{M+1}_{-i}, s) P(x_i) l(s) \le \lambda_M(P) - \alpha . \tag{62} $$

Therefore we have a contradiction and the minimum $\zeta$ of $F(\bar{V}, P)$ must be greater than 0. Equation (55)
follows. ■

The next lemma shows that for a sufficiently small choice of the threshold $\eta$ in the decoding rule there
are no random variables that can force the decoding rule to output a list that is too large. The proof
follows from Lemma 2 in the same way as in [?].

Lemma 3: Let $\beta > 0$, $\mathcal{W}$ be an AVC with state cost function $l(\cdot)$ and constraint $\Lambda$, $P \in \mathcal{P}(\mathcal{X})$ with
$\min_x P(x) \ge \beta$, and $M = L_{sym}(P, \Lambda) + 1$. Then there exists an $\eta > 0$ sufficiently small such that no
tuple of rv's $(Y, X^{M+1}, S^{M+1})$ can simultaneously satisfy

$$ \min_{x} P(x) \ge \beta \tag{63} $$

$$ P_{X_i} = P \tag{64} $$

$$ P_{Y X_i S_i} \in \mathcal{G}_{\eta}(\Lambda) \tag{65} $$

$$ I(Y X_i \wedge X^{M+1}_{-i} \mid S_i) \le \eta \qquad 1 \le i \le M + 1 . \tag{66} $$

Proof of Theorem 3: Given Lemma 3, the theorem follows from Lemma 3 of [?]. ■
C. Converse

The key idea in the converse is to show that for a codebook whose codewords have types that are
symmetrizable and close to a fixed symmetrizable type $P$, the jammer has a strategy that keeps the
error bounded away from 0. The rest follows from approximation and covering arguments.

Lemma 4 (Approximating joint distributions): Let $\mathcal{X}$ be a finite set with $|\mathcal{X}| \ge 2$. For any $\epsilon > 0$
and probability distribution $P$ on $\mathcal{X}$ there exists a $\delta > 0$ such that for any collection of distributions
$\{P_i \in \mathcal{P}(\mathcal{X}) : i \in [L]\}$ satisfying

$$ d_{\max}(P_i, P) < \delta \quad \forall i \tag{67} $$

and any joint distribution $\hat{P}(x_1, x_2, \ldots, x_L)$ with

$$ \sum_{x_j : j \ne i} \hat{P}(x_1, x_2, \ldots, x_L) = P_i(x_i) \quad \forall i, \; x_i \in \mathcal{X} \tag{68} $$

there exists a joint distribution $\bar{P}(x_1, x_2, \ldots, x_L)$ such that

$$ \sum_{x_j : j \ne i} \bar{P}(x_1, x_2, \ldots, x_L) = P(x_i) \quad \forall i, \; x_i \in \mathcal{X} \tag{69} $$

and

$$ d_{\max}(\hat{P}, \bar{P}) < \epsilon . \tag{70} $$

Proof of Lemma 4: Fix $\epsilon > 0$ and $P$. We consider two cases depending on whether
$\min_{x \in \mathcal{X}} P(x) = 0$ or not.

Case 1. First suppose $\min_{x \in \mathcal{X}} P(x) = \beta > 0$. Consider a set of distributions $\{P_i : i \in [L]\}$ satisfying
(67) and let $\hat{P}(x_1^L)$ be a joint distribution satisfying (68). We treat probability distributions as vectors in
$\mathbb{R}^{|\mathcal{X}|^L}$. We can construct a distribution $\bar{P}$ satisfying (69) and (70) in two steps: first we project $\hat{P}$ onto the
set of all vectors whose entries sum to 1 and satisfy (69), and then we find a $\bar{P}$ close to this projection
which is a proper probability distribution.

Let $\mathcal{B}$ be the affine subspace of $\mathbb{R}^{|\mathcal{X}|^L}$ of all vectors $P'$ satisfying the marginal constraints (69) as well as
the sum probability constraint

$$ \sum_{x_1^L} P'(x_1^L) = 1 . \tag{71} $$

We can summarize these linear constraints in the matrix form

$$ A P' = b' , \tag{72} $$

where $A$ contains the coefficients on the left-hand sides of the constraints (69) and (71) and $b'$ has the
right-hand sides. We can assume $A$ has full row-rank by removing linearly dependent constraints. Note
that the distribution $\hat{P}$ satisfies

$$ A \hat{P} = b , \tag{73} $$

where $b$ has the right-hand sides of (68) instead of (69).

Now let $\tilde{P}$ be the Euclidean projection of $\hat{P}$ onto the subspace $\mathcal{B}$:

$$ \tilde{P} = \hat{P} + A^T (A A^T)^{-1} (b' - A \hat{P}) . \tag{74} $$

The error in the projection is

$$ \hat{P} - \tilde{P} = A^T (A A^T)^{-1} (A \hat{P} - b') \tag{75} $$

$$ = A^T (A A^T)^{-1} (b - b') . \tag{76} $$

From (67) we can see that all elements of $(b - b')$ are in $(-\delta, \delta)$. Since the rows of $A$ are linearly
independent, the singular values of $A$ are strictly positive and a function of $|\mathcal{X}|$ and $L$ only. Therefore
there is a function $\mu_1(|\mathcal{X}|, L)$ such that

$$ \| A^T (A A^T)^{-1} (b - b') \|_2 < \mu_1(|\mathcal{X}|, L) \cdot \delta . \tag{77} $$

Since $|\mathcal{X}|$ is finite there is a function $\mu_2(|\mathcal{X}|, L)$ such that

$$ d_{\max}(\hat{P}(x_1^L), \tilde{P}(x_1^L)) < \mu_2(|\mathcal{X}|, L) \cdot \delta . \tag{78} $$

If the resulting $\tilde{P}$ from this first projection has all nonnegative entries, then we set $\bar{P} = \tilde{P}$ and choose
$\delta$ sufficiently small so that $\mu_2(|\mathcal{X}|, L) \cdot \delta < \epsilon$.

If $\tilde{P}$ has entries that are not in $[0, 1]$ then it is not a valid probability distribution. However, since $\hat{P}$
is a probability distribution, we know that

$$ \min_{x_1^L} \tilde{P}(x_1^L) > -\mu_2(|\mathcal{X}|, L) \cdot \delta . \tag{79} $$

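Let $P^L$ be the joint distribution on $\mathcal{X}^L$ with independent marginals $P$:

$$ P^L(x_1, \ldots, x_L) = P(x_1) \cdots P(x_L) . \tag{80} $$

Since $\min_x P(x) \ge \beta$ we have $P^L(x_1^L) \ge \beta^L$ for all $x_1^L$. Let

$$ \alpha = \frac{\mu_2(|\mathcal{X}|, L) \cdot \delta}{\beta^L} , \tag{81} $$

and set

$$ \bar{P} = (1 - \alpha) \tilde{P} + \alpha P^L . \tag{82} $$

Then $\bar{P}(x_1^L) \ge 0$ for all $x_1^L$ and by the triangle inequality:

$$ d_{\max}(\hat{P}, \bar{P}) \le d_{\max}(\hat{P}, \tilde{P}) + d_{\max}(\tilde{P}, \bar{P}) \tag{83} $$

$$ < \mu_2(|\mathcal{X}|, L) \cdot \delta + \alpha \, d_{\max}(\tilde{P}, P^L) \tag{84} $$

$$ < \left( 1 + \frac{1}{\beta^L} \right) \mu_2(|\mathcal{X}|, L) \cdot \delta . \tag{85} $$

Therefore for $\delta$ sufficiently small, we can choose a $\bar{P}$ such that $d_{\max}(\hat{P}, \bar{P}) < \epsilon$ for any $\epsilon > 0$.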
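The second step of Case 1, mixing the projected vector with the product distribution as in (81) and (82), can be illustrated for $L = 2$ and a binary alphabet. In the sketch below the projected vector `P_tilde` and the bound `neg_bound` (playing the role of $\mu_2(|\mathcal{X}|, L) \cdot \delta$) are hypothetical values chosen by hand; the function name is ours.

```python
def mix_with_product(P_tilde, P, neg_bound):
    """Return (1 - alpha) * P_tilde + alpha * P^2 for L = 2, with
    alpha = neg_bound / beta^2 and beta = min_x P(x).
    P_tilde[x1][x2] may have entries as small as -neg_bound."""
    beta = min(P)
    alpha = neg_bound / beta ** 2
    k = len(P)
    return [[(1 - alpha) * P_tilde[i][j] + alpha * P[i] * P[j]
             for j in range(k)] for i in range(k)]

P = [0.5, 0.5]
# Hypothetical projection: both marginals equal P, but two entries are negative.
P_tilde = [[0.55, -0.05], [-0.05, 0.55]]
P_bar = mix_with_product(P_tilde, P, neg_bound=0.05)

print(all(v >= 0 for row in P_bar for v in row))  # True
print([round(sum(row), 12) for row in P_bar])     # [0.5, 0.5]
```

Because both `P_tilde` and the product distribution have marginals equal to $P$, any mixture of them does as well, so the mixing repairs nonnegativity without disturbing the marginal constraints.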

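Case 2. We turn now to the second case. Suppose that $\min_{x \in \mathcal{X}} P(x) = 0$. Let $\mathcal{X}_0 = \{x \in \mathcal{X} : P(x) = 0\}$
and $\mathcal{Z} = \mathcal{X} \setminus \mathcal{X}_0$. Let $Q \in \mathcal{P}(\mathcal{Z})$ be the restriction of $P$ to $\mathcal{Z}$. Then $Q$ is a probability distribution on
$\mathcal{Z}$. First suppose that $|\mathcal{Z}| = 1$. Then $P(x) = 1$ for some $x \in \mathcal{X}$. Let

$$ \bar{P}(x_1^L) = P(x_1) \cdots P(x_L) . \tag{86} $$

Since all the marginal distributions $P_i$ of $\hat{P}$ satisfy $d_{\max}(P, P_i) < \delta$ we know that $d_{\max}(\hat{P}, \bar{P}) < \delta$.

Now suppose $|\mathcal{Z}| \ge 2$. We can construct $\bar{P}$ by first finding a joint distribution $\hat{Q}$ that is close to $\hat{P}$
and then invoking the first case of this proof on $\hat{Q}$. From (67) we know that for some $c > 0$ we have

$$ \sum_{x_1^L \notin \mathcal{Z}^L} \hat{P}(x_1, x_2, \ldots, x_L) = c\delta \tag{87} $$

$$ < |\mathcal{X}|^L \delta . \tag{88} $$

Define $\hat{Q}$ by

$$ \hat{Q}(x_1^L) = \begin{cases} \hat{P}(x_1^L) + |\mathcal{Z}|^{-L} c\delta & x_1^L \in \mathcal{Z}^L \\ 0 & x_1^L \notin \mathcal{Z}^L . \end{cases} \tag{89} $$

Since $\hat{Q}$ has support only on $\mathcal{Z}^L$ we can think of it either as a distribution on $\mathcal{X}^L$ or on $\mathcal{Z}^L$. Note that

$$ d_{\max}(\hat{P}, \hat{Q}) < c\delta . \tag{90} $$

Let $\{Q_i : i \in [L]\}$ be the $i$-th marginal distributions of $\hat{Q}$:

$$ Q_i(x_i) = \sum_{x_j : j \ne i} \hat{Q}(x_1, x_2, \ldots, x_L) \quad \forall i, \; x_i \in \mathcal{Z} . \tag{91} $$

Then we have for some $c' > 0$

$$ d_{\max}(Q, Q_i) < c'\delta . \tag{92} $$

Now we can apply Case 1 of this proof using the set $\mathcal{Z}$ and distributions $Q$, $\{Q_i\}$, and $\hat{Q}$. For any
$\epsilon_1 > 0$ we can find a $\delta_1 > 0$ such that if $\{Q_i\}$ satisfy

$$ d_{\max}(Q, Q_i) < \delta_1 , \tag{93} $$

then there exists a $\bar{Q}$ with marginals equal to $Q$ such that

$$ d_{\max}(\hat{Q}, \bar{Q}) < \epsilon_1 . \tag{94} $$

Let $\bar{P}$ be the extension of $\bar{Q}$ to a distribution on $\mathcal{X}^L$, obtained by setting $\bar{P}(x_1^L) = \bar{Q}(x_1^L)$ for $x_1^L \in \mathcal{Z}^L$ and
0 elsewhere. By the triangle inequality we have

$$ d_{\max}(\hat{P}, \bar{P}) \le d_{\max}(\hat{P}, \hat{Q}) + d_{\max}(\hat{Q}, \bar{Q}) \tag{95} $$

$$ < c\delta + \epsilon_1 . \tag{96} $$

We can choose $\delta$ sufficiently small so that $\delta_1$ and $\epsilon_1$ are sufficiently small to guarantee that this distance
is less than $\epsilon$. ■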
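Lemma 5: Let $\mathcal{W}$ be an AVC with state cost function $l(\cdot)$ and constraint $\Lambda$ and let $L$ be a positive
integer. Let $\epsilon > 0$ be arbitrary and suppose $P$ is a distribution with $\tilde{\lambda}_L(P) < \Lambda - \epsilon$. Then there exists
a $\delta > 0$ and $n_0$ such that for any $(n, N, L)$ list code with $n \ge n_0$ and $N \ge L + 1$ whose codewords
$\{\mathbf{x}(i) : i \in [N]\}$ satisfy

$$ d_{\max}(T_{\mathbf{x}(i)}, P) < \delta \quad \forall i \in [N] \tag{97} $$

$$ \tilde{\lambda}_L(T_{\mathbf{x}(i)}) < \Lambda - \epsilon \quad \forall i \in [N] , \tag{98} $$

the average error for the code is lower bounded: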

max £l(s) > -r—r — , r/T — tt • (99) 
se s»(A) w L + l A(L + 1) 

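Proof: From Lemma 4 we can see that for any $\epsilon_1 > 0$ there exists a $\delta_1 > 0$ such that for any set $J \subset [N]$ of codewords with $|J| = L$ and $d_{\max}(T_{\mathbf{x}(j)}, P) < \delta_1$ for all $j \in J$, we can find a joint type $\hat{P} \in \mathcal{P}(\mathcal{X}^L)$ with marginals equal to $P$ such that the joint type $T_{\mathbf{x}(J)}$ satisfies

\[ d_{\max}(T_{\mathbf{x}(J)}, \hat{P}) < \epsilon_1 . \tag{100} \]

Now let $U$ achieve the minimum in the definition of $\lambda_L(P)$. Since $\lambda_L(P) < \Lambda - \epsilon$ we have

\begin{align}
\sum_{s, x_1^L} l(s) U(s \mid x_1^L) T_{\mathbf{x}(J)}(x_1^L) &\le \sum_{s, x_1^L} l(s) U(s \mid x_1^L) \hat{P}(x_1^L) + \epsilon_1 \lambda^* |\mathcal{X}|^L \tag{101} \\
&< \Lambda - \epsilon + \epsilon_1 \lambda^* |\mathcal{X}|^L , \tag{102}
\end{align}

where $\lambda^* = \max_{s \in \mathcal{S}} l(s)$. Now choose $\epsilon_1 = \epsilon / (2 \lambda^* |\mathcal{X}|^L)$ so that

\[ \sum_{s, x_1^L} l(s) U(s \mid x_1^L) T_{\mathbf{x}(J)}(x_1^L) < \Lambda - \epsilon/2 , \tag{103} \]

and choose $\delta = \delta_1$ according to Lemma 4.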
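The perturbation step behind (101) says that two joint distributions on $\mathcal{X}^L$ that differ by at most $\epsilon_1$ at every point have expected costs that differ by at most $\epsilon_1 \lambda^* |\mathcal{X}|^L$. The Python sketch below checks this on randomly drawn inputs. Everything in it is hypothetical: the alphabets, the cost values, the randomly generated channel `U`, and the helper names `cost` and `rand_dist` are assumptions for illustration and are not taken from the paper.

```python
import itertools
import random

def cost(l, U, D):
    """Expected cost: sum over s and x^L of l(s) U(s | x^L) D(x^L)."""
    return sum(l[s] * U[x][s] * D[x] for x in D for s in range(len(l)))

def rand_dist(k):
    """A random probability vector of length k."""
    w = [random.random() for _ in range(k)]
    total = sum(w)
    return [v / total for v in w]

random.seed(0)
X, L = [0, 1], 2
l = [0.0, 1.0, 3.0]  # hypothetical per-letter state costs
tuples = list(itertools.product(X, repeat=L))

U = {x: rand_dist(len(l)) for x in tuples}        # arbitrary channel U(s | x^L)
P_hat = dict(zip(tuples, rand_dist(len(tuples))))  # stands in for the joint P-hat
T = dict(zip(tuples, rand_dist(len(tuples))))      # stands in for the joint type

eps1 = max(abs(T[x] - P_hat[x]) for x in tuples)   # pointwise deviation
lam_star = max(l)
gap = cost(l, U, T) - cost(l, U, P_hat)
bound = eps1 * lam_star * len(X) ** L
```

With the choice $\epsilon_1 = \epsilon / (2 \lambda^* |\mathcal{X}|^L)$ the quantity `bound` equals $\epsilon/2$, which is how (103) follows from (102).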

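The jammer will pick a $J \subset [N]$ with $|J| = L$ uniformly from all such subsets and select its state sequence according to the random variable $\mathbf{S}(J)$ with distribution

\[ Q^n(\mathbf{s}) = \prod_{t=1}^{n} U(s_t \mid \{x_t(j) : j \in J\}) . \tag{104} \]

The expected cost of $\mathbf{S}(J)$ is

\begin{align}
\frac{1}{n} \mathbb{E}[l(\mathbf{S}(J))] &= \frac{1}{n} \sum_{t=1}^{n} \sum_{s} l(s) U(s \mid \{x_t(i) : i \in J\}) \tag{105} \\
&= \sum_{s} \sum_{x_1, \ldots, x_L} l(s) U(s \mid x_1, \ldots, x_L) T_{\mathbf{x}(J)}(x_1, \ldots, x_L) \tag{106} \\
&= \sum_{s, x_1^L} l(s) U(s \mid x_1^L) T_{\mathbf{x}(J)}(x_1^L) \tag{107} \\
&< \Lambda - \epsilon/2 . \tag{108}
\end{align}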
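The passage from the per-letter average in (105) to the joint-type expression in (107) is a regrouping of the sum over time indices by the value of the codeword tuple. The Python sketch below illustrates it for $L = 2$. The two codewords, the cost values, and the channel `U` are hypothetical and chosen only for illustration.

```python
from collections import Counter

l = {0: 0.0, 1: 1.0}                      # hypothetical state costs l(s)
U = {(0, 0): {0: 1.0, 1: 0.0},            # hypothetical U(s | x_1, x_2)
     (0, 1): {0: 0.5, 1: 0.5},
     (1, 0): {0: 0.5, 1: 0.5},
     (1, 1): {0: 0.2, 1: 0.8}}
x1 = [0, 0, 1, 1, 0, 1, 1, 0]             # first codeword in J
x2 = [0, 1, 1, 0, 0, 1, 0, 0]             # second codeword in J
n = len(x1)

# Per-letter average of the expected cost, as in (105).
per_letter = sum(l[s] * U[(a, b)][s] for a, b in zip(x1, x2) for s in l) / n

# The same quantity written through the joint type, as in (107).
counts = Counter(zip(x1, x2))
T = {pair: c / n for pair, c in counts.items()}
via_type = sum(l[s] * U[pair][s] * T[pair] for pair in T for s in l)
```

Both expressions evaluate to the same number, since each time index contributes exactly once to the count of its codeword tuple.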
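We can also bound the variance of the normalized cost of $\mathbf{S}(J)$, using the independence of its components under (104):

\[ \mathrm{Var}\left( \frac{1}{n} l(\mathbf{S}(J)) \right) \le \frac{(\lambda^*)^2}{n} . \tag{109} \]

Then Chebyshev's inequality, together with (108), gives the bound:

\begin{align}
P\left( \frac{1}{n} l(\mathbf{S}(J)) > \Lambda \right) &\le \frac{(\lambda^*)^2 / n}{(\epsilon/2)^2} \tag{110} \\
&= \frac{4 (\lambda^*)^2}{n \epsilon^2} . \tag{111}
\end{align}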
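The Chebyshev bound $4(\lambda^*)^2 / (n \epsilon^2)$ in (111) decays as $1/n$, so it can be inverted to find a blocklength beyond which the probability of violating the cost constraint is below a given target. The Python sketch below does this with exact rational arithmetic. The function names `tail_bound` and `min_blocklength`, the target `eta`, and the numerical values are assumptions introduced for illustration.

```python
from fractions import Fraction
from math import ceil

def tail_bound(n, lam_star, eps):
    """Chebyshev bound 4 (lambda*)^2 / (n eps^2)."""
    return 4 * lam_star ** 2 / (n * eps ** 2)

def min_blocklength(lam_star, eps, eta):
    """Smallest integer n with tail_bound(n, lam_star, eps) <= eta."""
    return ceil(4 * lam_star ** 2 / (eta * eps ** 2))

# Hypothetical values: lambda* = 3, eps = 1/10, target eta = 1/20.
lam_star, eps, eta = Fraction(3), Fraction(1, 10), Fraction(1, 20)
n_min = min_blocklength(lam_star, eps, eta)
```

Using `Fraction` inputs avoids floating-point rounding at the threshold, which matters when the bound meets the target exactly.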

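We now need some properties of symmetrizing channels used with the random variables $\mathbf{S}(J)$. Firstly, for $i \notin J$ and $j \in J$ we have:

\begin{align}
\mathbb{E}[W^n(\mathbf{y} \mid \mathbf{x}(i), \mathbf{S}(J))] &= \sum_{\mathbf{s}} W^n(\mathbf{y} \mid \mathbf{x}(i), \mathbf{s}) U^n(\mathbf{s} \mid \{\mathbf{x}(j) : j \in J\}) \tag{112} \\
&= \mathbb{E}[W^n(\mathbf{y} \mid \mathbf{x}(j), \mathbf{S}((J \setminus \{j\}) \cup \{i\}))] . \tag{113}
\end{align}

Using (113) we can see that for any subset $G \subset [N]$ with $|G| = L + 1$ and any fixed $i_0 \in G$:

\begin{align}
\sum_{i \in G} \mathbb{E}[\varepsilon_L(i, \mathbf{S}(G \setminus \{i\}))] &= \sum_{i \in G} \left( 1 - \sum_{\mathbf{y} : i \in \psi(\mathbf{y})} \mathbb{E}[W^n(\mathbf{y} \mid \mathbf{x}(i), \mathbf{S}(G \setminus \{i\}))] \right) \tag{114} \\
&= L + 1 - \sum_{i \in G} \sum_{\mathbf{y} : i \in \psi(\mathbf{y})} \mathbb{E}[W^n(\mathbf{y} \mid \mathbf{x}(i_0), \mathbf{S}(G \setminus \{i_0\}))] . \tag{115}
\end{align}

Because each $\mathbf{y}$ can be decoded to a list of size at most $L$, each $\mathbf{y}$ appears in the double sum for at most $L$ indices $i \in G$, and we can get a lower bound

\begin{align}
\sum_{i \in G} \mathbb{E}[\varepsilon_L(i, \mathbf{S}(G \setminus \{i\}))] &\ge L + 1 - L \sum_{\mathbf{y} \in \mathcal{Y}^n} \mathbb{E}[W^n(\mathbf{y} \mid \mathbf{x}(i_0), \mathbf{S}(G \setminus \{i_0\}))] \\
&= 1 . \tag{116}
\end{align}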

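We can now begin to bound the probability of error for this jamming strategy. Let $\mathcal{J}$ be the set of all subsets of $[N]$ of size $L$, and let $J$ be a random variable uniformly distributed on $\mathcal{J}$. We can write the expected error as

\[ \mathbb{E}_{J, \mathbf{S}(J)}[\varepsilon_L(\mathbf{S}(J))] = \frac{1}{\binom{N}{L}} \sum_{J \in \mathcal{J}} \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}[\varepsilon_L(i, \mathbf{S}(J))] . \tag{117} \]

Then we have:

\[ \mathbb{E}_{J, \mathbf{S}(J)}[\varepsilon_L(\mathbf{S}(J))] \ge \frac{1}{\binom{N}{L}} \cdot \frac{1}{N} \sum_{G \subset [N] : |G| = L+1} \ \sum_{i \in G} \mathbb{E}[\varepsilon_L(i, \mathbf{S}(G \setminus \{i\}))] . \tag{118} \]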

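Now we can bound the inner sum using (116), which follows from (113):

\begin{align}
\mathbb{E}_{J, \mathbf{S}(J)}[\varepsilon_L(\mathbf{S}(J))] &\ge \frac{\binom{N}{L+1}}{\binom{N}{L} \cdot N} \tag{119} \\
&= \frac{N-L}{L+1} \cdot \frac{1}{N} \tag{120} \\
&= \frac{N-L}{(L+1)N} \tag{121} \\
&= \frac{1}{L+1} - \frac{L}{N(L+1)} . \tag{122}
\end{align}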
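The counting step from the binomial ratio $\binom{N}{L+1} / (N \binom{N}{L})$ in (119) to the closed form $\frac{1}{L+1} - \frac{L}{N(L+1)}$ in (122) can be verified exactly for small parameters. The Python sketch below uses rational arithmetic; the function names `counting_bound` and `closed_form` and the tested parameter ranges are choices made here for illustration.

```python
from fractions import Fraction
from math import comb

def counting_bound(N, L):
    """C(N, L+1) / (N * C(N, L)), the binomial ratio in (119)."""
    return Fraction(comb(N, L + 1), N * comb(N, L))

def closed_form(N, L):
    """1/(L+1) - L/(N(L+1)), the closed form in (122)."""
    return Fraction(1, L + 1) - Fraction(L, N * (L + 1))

# All pairs with 1 <= L <= 5 and L + 1 <= N < 40 on which the two agree.
pairs = [(N, L) for L in range(1, 6) for N in range(L + 1, 40)]
agree = [(N, L) for (N, L) in pairs
         if counting_bound(N, L) == closed_form(N, L)]
```

The identity rests on $\binom{N}{L+1} / \binom{N}{L} = (N-L)/(L+1)$, so the two expressions agree for every admissible pair, not only those tested.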

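Finally, we can add in the bound (111) to obtain

\begin{align}
\frac{1}{L+1} - \frac{L}{N(L+1)} &\le \mathbb{E}_{J, \mathbf{S}(J)}[\varepsilon_L(\mathbf{S}(J))] \tag{123} \\
&\le \max_{\mathbf{s} \in \mathcal{S}^n(\Lambda)} \varepsilon_L(\mathbf{s}) + P\left( \frac{1}{n} l(\mathbf{S}(J)) > \Lambda \right) \tag{124} \\
&\le \max_{\mathbf{s} \in \mathcal{S}^n(\Lambda)} \varepsilon_L(\mathbf{s}) + \frac{4 (\lambda^*)^2}{n \epsilon^2} . \tag{125}
\end{align}

Now, we can choose $n_0$ large enough such that

\[ \max_{\mathbf{s} \in \mathcal{S}^n(\Lambda)} \varepsilon_L(\mathbf{s}) \ge \frac{1}{L+2} - \frac{L}{N(L+1)} . \tag{126} \]

Lemma 6: Let $W$ be an AVC with state cost function $l(\cdot)$ and constraint $\Lambda$ and let $L$ be a positive integer. For any $\epsilon > 0$ there exists a $\nu(L, W, \epsilon) > 0$ and $n_0$ such that for any $(n, N, L)$ list code $(\phi, \psi)$ with $n \ge n_0$ and $N \ge L + 1$ whose codewords $\{\mathbf{x}(i) : i \in [N]\}$ satisfy

\[ \lambda_L(T_{\mathbf{x}(i)}) < \Lambda - \epsilon \qquad \forall i \in [N] , \tag{127} \]

the error must satisfy

\[ \max_{\mathbf{s} \in \mathcal{S}^n(\Lambda)} \varepsilon_L(\mathbf{s}) \ge \nu(L, W, \epsilon) . \tag{128} \]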

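Proof: Fix $\epsilon > 0$. For each $P \in \mathcal{P}(\mathcal{X})$, from Lemma 4 we know there is a $\delta(P) > 0$ such that any joint distribution $\tilde{P}$ with marginals within $\delta(P)$ of $P$ can be approximated by a $\hat{P}$ with marginals equal to $P$ such that $d_{\max}(\tilde{P}, \hat{P}) < \epsilon$. Let

\[ B(P) = \{P' \in \mathcal{P}(\mathcal{X}) : d_{\max}(P, P') < \delta(P)\} . \tag{129} \]

Then $\{B(P) : P \in \mathcal{P}(\mathcal{X})\}$ is an open cover of $\mathcal{P}(\mathcal{X})$. Since $\mathcal{P}(\mathcal{X})$ is compact there is a constant $r$ and a finite subcover $\{B(P_j) : j \in [r]\}$. From this finite cover we can create a partition $\{A_j : j \in [r]\}$ of $\mathcal{P}(\mathcal{X})$ such that $A_j \subseteq B(P_j)$ for all $j$.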

Now consider an $(n, N, L)$ code whose codewords $\mathcal{C}$ satisfy (127). Let $F_j = \{i \in [N] : T_{\mathbf{x}(i)} \in A_j\}$. We can bound the error

\[ \varepsilon_L(\mathbf{s}) \ge \frac{1}{N} \sum_{i \in F_j} \varepsilon_L(i, \mathbf{s}) . \tag{130} \]

Since $\{F_j\}$ partition the codebook, for some $j$ we have $|F_j| \ge N/r$. From Lemma 5 the jammer can force the error to be lower bounded by

\[ \max_{\mathbf{s} \in \mathcal{S}^n(\Lambda)} \varepsilon_L(\mathbf{s}) \ge \frac{1}{r} \left( \frac{1}{L+1} - \frac{L}{N(L+1)} \right) . \tag{131} \]

Since the constant $r$ is a function of $\epsilon$, $W$ and $L$, we are done. ■
Theorem 2 follows from the preceding Lemma.